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Abstract 

Equilibrium sampling of biomolecules remains an unmet challenge after more than 30 
years of atomistic simulation. Efforts to enhance sampling capability, which are reviewed 
here, range from the development of new algorithms to parallelization to novel uses of 
hardware. Special focus is placed on classifying algorithms — most of which are un- 
derpinned by a few key ideas — in order to understand their fundamental strengths and 
limitations. Although algorithms have proliferated, progress resulting from novel hardware 
use appears to be more clear-cut than from algorithms alone, partly due to the lack of 
widely used sampling measures. 

1 Introduction 

1 — 1 1.1 Why sample? 

Biomolecular behavior can be substantially characterized by the states of the system 
of interest — that is, by the configurational energy basins reflecting the coordinates of 
all constituent molecules in a system. These states might be largely internal to a single 
macromolecule such as a protein, or more generally involve binding partners and their 
relative coordinates. But regardless of the complexity of a system, the states represent 
key functional configurations along with potential intermediates for transitioning among the 
major states. The primary goal of equilibrium computer simulation is to specify configura- 
tional states and their populations. A more complete, mechanistic description would also 
include kinetic properties and dynamical pathways |147| . Nevertheless, even the equilib- 
rium description provides a basic view of the range of structural motions of biomolecules, 
and can also serve more immediately practical purposes such as for ensemble dock- 
ing (56}[65). 

The basic algorithm of biomolecular simulation, molecular dynamics (MD) simulation, 
has not changed substantially since the first MD study of a protein more than 30 years 
ago [82]. Although routine explicit-solvent MD simulations are now four or five orders of 
magnitude longer (i.e., 100 - 10 3 nsec currently), modern MD studies still appear to fall 
significantly short of what is needed for statistically valid equilibrium simulation (e.g., (36} 
38]). Roughly speaking, one would like to run a simulation at least 10 times longer than 
the slowest important timescale in a system. Unfortunately, many biomolecular timescales 
exceed 1 ms, and in some cases by orders of magnitude (e.g., (44)). 

Despite the outpouring of algorithmic ideas over the past decades, MD largely remains 
the tool of choice for biomolecular simulations. While this staying power partly reflects the 
ready availability of software packages, and perhaps some psychological inertia, it also 
is indicative of a simple fact: no other method can routinely and reliably "beat" MD by a 
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significant amount. 

This article will employ several points of view in considering efforts to improve sam- 
pling. First, it will attempt to define "the equilibrium sampling problem(s)" as precisely 
as possible, which necessarily includes discussing the quantification of sampling. Sig- 
nificant and substantiated progress is not possible without clear yardsticks. Secondly, a 
brief discussion of how sampling happens — i.e., how simulations generate ensembles 
and averages — will provide a basis for understanding numerous methods. Third, the 
article will attempt to review a fairly wide array of modern algorithms — most of which 
are underpinned by a surprisingly small number of key ideas. The algorithms and studies 
are too numerous to be reviewed on a case-by-case basis, but a bird's eye perspective is 
very informative; more focused attention will be paid to apparently conflicting statements 
about the replica exchange approach, however. Fourth, special emphasis will be placed 
on novel uses of hardware — GPUs, RAM, and special CPUs; this new "front" in sam- 
pling efforts has yielded rather clean results in some cases. Using hardware in new ways 
necessarily involves substantial algorithm efforts. 

This review is limited. At least a full volume would be required to comprehensively 
describe sampling methods and results for biomolecular systems. This review therefore 
aims primarily to catalog key ideas and principles of sampling, along with enough refer- 
ences for the reader to delve more deeply into areas of interest. The author apologizes for 
the numerous studies which have not been mentioned for reasons of space or because 
he was not aware of them. Other review articles (e.g., (4}|1§)) ancl books on simulation 
methods |2l [32l[62l[T05l are available. 

2 What is the sampling problem? 

While scientists actively working in the field of biomolecular simulation may assume 
the essential meaning of "the sampling problem" is universally accepted, a survey of the 
literature indicates that several somewhat different interpretations are implicitly assumed. 
What is meant by the sampling problem and, accordingly, success or failure in addressing 
the problem, surely will dictate the choice of methodology. 

2.1 A simple view of the equilibrium sampling problem 

Equilibrium sampling at constant temperature and fixed volume can be concisely be 
identified with the generation of full-system configurations x distributed according to the 
Boltzmann-factor distribution: 

p{x)<xexp[-U(x)/k B T\, (1) 

where p is the probability density function, U is the potential energy function, k B is Botlz- 
mann's constant, and T is the absolute temperature in Kelvin units. The coordinates x 
refer (in the classical picture assumed here) to the coordinates of every atom in the sys- 
tem, including solute and solvent. Note that other thermodynamic conditions, such as 
constant pressure or constant chemical potential, require additional energy-like terms in 
the Boltzmann factor |1 47] , but here we shall consider only the distribution <[T} for clarity 



and simplicity. 

Performing equilibrium sampling requires access to all regions of configuration space, 
or at least to those regions with significant populations, and also requires that configura- 
tions have the correct relative probabilities. This point will be elaborated on, below. 

2.2 Ideal sampling as a reference point 

In practice, it is nearly impossible to generate completely independent and identically 
distributed (i.i.d.) configurations — obeying Eq. ([T) — for biomolecules. Both dynamical 



2 



and non-dynamical methods tend to produce correlated samples p8l[41]|73|[86l|14H . Nev- 
ertheless, such ideal sampling is a highly useful reference point for disentangling compli- 
cations arising in practical simulation methods which generate correlated configurations. 

Consider a hypothetical example where an equilibrium ensemble of iV i.i.d. configu- 
rations has been generated according to Eq. ([1} for a biomolecule with a complex land- 
scape. Regions of configuration space with probability p > 1/N are likely to be rep- 
resented in the ensemble, and this is true regardless of kinetic barriers among states. 
Conversely, regions with p < 1/N are not likely to be represented, regardless of whether 
the region is "interesting." For instance, the unfolded state might or might not be an ap- 
propriate part of a size iV ensemble, depending on the system (i.e., on the free energy 
difference between folded and unfolded states) and on N. See Fig. [Tjand Sec. |2.3.1 
below. 




Configurational coordinate 



Figure 1 : A schematic energy landscape of a protein. Both the folded states and unfolded 
states generally can be expected to consist of multiple sub-states, /■ and u u respec- 
tively. Transitions among sub-states themselves could be slow, compared to simulation 
timescales. The "sampling problem" is sometimes conceptualized to involved generating 
configurations from both folded and unfolded states: the text discusses when this view is 
justified. 

2.3 Goals and ambiguities in equilibrium sampling 

The question, "What should be achieved in equilibrium sampling?" can be divided 
into at least three questions. First, for a given problem (defined by the specific system 
and a specified initial condition), there is the question of defining success — When is 
sampling sufficient? Usually, the goal of equilibrium simulation is to calculate observables 
of interest to a level of precision sufficient to draw physical conclusions. Yet the variables 
"of interest" will vary from study to study. It therefore seems more reasonable to set a 
goal of generating an ensemble of configurations from which arbitrary observables can 
be calculated. In rough terms, let us therefore assume that the goal is to calculate iV eff 
effectively independent configurations, where presumably it is desirable to achieve N eS > 
10. Larger value s may often be necessary, however, because fractional uncertainty will 
vary as l/\/N cS for slower-to-converge properties such as state populations. This issue 
will be addressed further below. 

The other questions are probably the most pressing, starting with, "What type of sys- 
tem is to be sampled?" A basic distinction that seems to appear implicitly in the literature is 
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whether the unfolded state is important in the system or not. In this context, "importance" 
would usually be defined by whether or not a system exhibits a significant equilibrium un- 
folded fraction — say, roughly equal probabilities p(folded) ~ p(unfolded). Indeed, some 
evaluations of sampling methods have specifically targeted such a balance (48j[T02j, e.g., 
by adjusting the temperature. 



A third question about initial conditions is addressed separately, below, in Sec. |2.3.1 
These questions arise in the first place because all available practical sampling meth- 
ods for biomolecules are "imperfect" in the sense of producing correlated configurations. 



In ideal equilibrium sampling (Sec. by contrast, the sufficiency of sampling is fully 



quantified in terms of the number N of independent configurations, and can be assessed 
objectively in terms of observables of interest. The notion of an initial condition is irrele- 
vant in ideal sampling since there are no correlations. Also, the values of p(folded) and 
p(unfolded) have no effect on sampling quality, which is fully embodied in N; instead, it 
is iV which determines what states are represented in the ensemble and the appropriate 
frequencies of occurrence. 

2.3.1 Should initial conditions matter? Issues surrounding the unfolded state 

A typical simulation of a protein can be described as one intended to sampled the 
folded state, which may consist of a number of substates as in Fig. [T| So long as the 
simulation is initiated in or near one of these substates, the resources required to sample 
it will not depend significantly on the initial conditions. Rather, timescales for transitioning 
among substates will dominate the computing cost. In a dynamical simulation, after all, 
it is the transitions among substates which yields estimates for substate populations and 
hence the equilibrium ensemble: see Sec.|3/T 



Sensitivity to initial conditions can become important when the unfolded state is in- 
volved. Even when the goal is to sample the equilibrium ensemble of an overwhelmingly 
folded protein, the initial condition can come into play if it introduces an anomalously long 
timescale. Consider beginning an equilibrium simulation of the landscape of Fig. [T] in the 
unfolded substate labeled u 6 . Depending on details of the system, the time required to 
find the folded state could easily exceed the time for sampling folded substates. Thus, 
such a simulation, first has to solve a search problem before sampling is begun. 

We can set our discussion in the context the folding free energy, defined based on the 
ratio of probabilities for unfolded and folded states: 

p(folded)/p(unfolded) = exp ( -AG Md / k B T) . (2) 

Consider a value AG fo i d ~ -3kcal/mol ~ -5k B T, which implies the unfolded state is occu- 
pied 1 % of the time or less. In ensembles with fewer than 1 00 independent configurations 
{N eS < 100), such an unfolded state typically should not be represented. In practical 
terms, because key folded-state timescales often exceed ^sec or even msec, it is unlikely 
that a simulation of a protein in folding conditions in the current era will contain a sufficient 
number of independent configurations for the unfolded state to be represented. 

Note that we have assumed it is straightforward to distinguish folded and unfolded 
configurations. In reality, there may be a spectrum of disordered states in many systems, 
and hence ambiguity in defining folded and unfolded states. 

3 Sampling basics: Mechanism, timescales, and cost 

3.1 Transitions provide the key information in sampling 

The problem of equilibrium sampling can be summarized as determining the metastable 
states and their relative populations, but how are populations actually determined by the 
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simulation? In almost every practical algorithm, the information comes from transitions among 
the states. By switching back and forth between pairs of states, with dwell times between 
transitions acting as proxies for rate constants k, a dynamical simulation gathers infor- 
mation about relative populations based on the equilibrium balance relation p^ = Pjkji 
1 147]. With more transitions, the error in the population estimates pi decreases [102]. 



Without transitions, most algorithms have no information about relative populations. Con- 
sider the case of two independent simulations started from different states which exhibit 
no transitions: determining the populations of the states then requires more advanced 
analysis [136] which often may not be practical. 

It is critical realize that most "advanced algorithms" — such as the varieties of ex- 
change simulation (Sec. [5} — also require "ordinary" transitions to gather population in- 
formation. That is, transitions among states within individual continuous trajectories are 
required to obtain sampling, as has been discussed before |38"[|102[ |T43|. Any algorithm 



that uses dynamical trajectories as a component can be expected to require transitions 
among metastable states in those trajectories. 

It must also be pointed out that transitions themselves do not necessarily equate to 
good sampling. An example is when a high temperature is used to aid sampling at a 
lower temperature. Although transitions are necessary, they are not sufficient because of 
the overlap issue: the sampled configurations ultimately must be important in the targeted 
(e.g., lower temperature) ensemble after they are reweighted |72| , |109| . 

3.2 Timescales of sampling 

Much of the foregoing discussion can be summed up based on two key timescales. 
(Even if a sampling method is not fully dynamical, analogous quantities can be defined 
in terms of computing times.) The timescale that typically is limiting for simulations can 
be denoted as t* orr which denotes the longest "important" equilibrium correlation time. 
Roughly, this is the time required to explore all the important parts of configuration space 
once (73) — starting from any well populated (sub)state — and is elaborated on below. 
For sampling the landscape of Fig. Q] under folding conditions, this would be the time 
to visit all folded substates starting from any of the folded basins. However, as also 
discussed above, initial conditions may play an important role, and thus we can define 
t init as the "equilibration" time (or "burn-in" time in Monte Carlo lingo). This is not an 
equilibrium timescale, but the time necessary to relax from a particular non-equilibrium 
initial condition to equilibrium. Thus, £ init is specific to the initial configuration of each 
simulation. The landscape of Fig. [T] under folding conditions suggests £ init < £* orr for a 
simulation started from a folded substate, but it is possible to have significant values — 
even t init > t* orr — when starting from an unfolded state. 

3.3 Factors contributing to sampling cost 

A little thought about the ingredients of sampling can help one understand successes 
and failure of various efforts — and more importantly, aid in planning future research. The 
most important ingredient to understand is single-trajectory sampling — i.e., the class of 



dynamical methods discussed in Sec. |4.1 [ including MD, single-Markov-chain Monte Carlo 
(MC), or any other method that generates an ensemble as a sequentially correlated list of 
configurations. Single-trajectory methods tend to be the engine underlying traditional and 
advanced methods. 

The sampling cost of a trajectory method can be divided into two intuitive factors: 

Trajectory sampling cost = (Cost per trajectory step) x (Number of steps needed for sampling) 

~ (Cost of energy call) x (Roughness of energy landscape) . (3) 
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In words, a given algorithm will require a certain number of steps to achieve good sam- 
pling (e.g., iV eff > 1) and the total cost of generating such a trajectory is simply the total 
cost per step multiplied by the required number of steps. Although this decomposition is 
trivial, it is very informative. For example, if one wants to use a single-trajectory algorithm 
(e.g., MD or similar) where a step corresponds roughly to 10~ 15 sec, then 10 9 - 10 12 steps 
are required to reach the usee - msec range. Therefore, unless the cost of a step can be 
reduced by several orders of magnitude, MD and similar methods likely cannot achieve 
sampling with typical current resources. In a modified landscape (e.g., different temper- 
ature or model) the roughness may be reduced — but, again, that reduction should be 
several orders of magnitude, or else be accompanied by a compensating reduction in the 
cost per step. Among noteworthy examples, increasing temperature decreases rough- 
ness but not the cost per step, whereas changing a model (i.e., modifying the potential 
energy function U) can affect both roughness and the step cost. Both strategies sample 
a different distribution, which may be nontrivial to convert to the targeted ensemble. 

The importance of understanding dynamical trajectory methods via Eq. ^ is under- 
scored by the fact that most multi-level (e.g., exchange) simulations require good sampling 
at some level |143[|148 ], by single continuous trajectories. 

4 Quantitative assessment of sampling 

Although the challenge of sampling biomolecular systems has been widely recognized 
for years, and although new algorithms are regularly proposed, systematic quantification 
of sampling has often not received sufficient attention. Most importantly, there does not 
seem to be a widely accepted (and widely used) "yardstick" for quantifying the effective- 
ness of sampling procedures. Given the lack of a broadly accepted measure, it is tempting 
for individual investigators to report measures which cast the most favorable light on their 
results. 

Two key benefits would flow from a universal measure of sampling quality. As an ex- 
ample, assume we are able to assign an effective sample size, iV off , to any ensemble 
generated in a simulation; N cG would characterize the number of statistically indepen- 
dent configurations generated. Sampling efficiency could then be quantified by the CPU 
time (or number of cycles) required per independent sample. The first benefit is that we 
would no longer be able to fool ourselves or others as to the efficiency of a method based 
on qualitative evidence or perhaps the elegance of a method. Thus, the field would be 
pushed harder to focus on a bottom-line measure. Secondly, describing the importance 
of a new algorithm would be more straightforward. It would no longer be necessary to 
directly compare a new method to, perhaps, a competitor's approach requiring subtle 
optimization. Instead, each method could be compared to a standard (e.g., molecular dy- 
namics), removing ambiguities in the outcome. It would still be important to test a method 
on a number of systems (e.g., small and large, stiff and flexible, implicitly and explicitly 
solvated), but the results would be quantitative and readily verified by other groups. On a 
related note, there has been a proposal to organize a sampling contest or challenge event 
to allow head-to-head comparison of different methodologies (B.R. Brooks, unpublished). 



Numerous ideas for assessing sampling have been proposed over the years (e.g., [36 



3^ [4Tl[73l[8^|8^[TT4l[TT^[T4T ] ) , but it is important to divide such proposals into "absolute" 



and "relative" measures. Absolute measures attempt to give a binary indication of whether 
or not "convergence" has been achieved, whereas relative measures estimate how much 
sampling has been achieved — e.g., by an effective sample size N cK . The perspective of 
the present article is that absolute measures fail to account for the fundamental statistical 
picture underlying the sampling problem. To see why, note that any measured observable 
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will have an uncertainty associated with it; roughly speaking, sampling quality is reflected 
in typical sizes of error bars which decrease with better sampling. There does not seem to 
an unambiguous point where sampling can be considered absolutely converged (although 
a simulation effectively does not begin to sample equilibrium averages until t mit has been 
surpassed). 

It is certainly valuable, nevertheless, to be able to gauge when sampling is wholly 
inadequate, in an absolute sense. Absolute methods which may be useful at detecting 
extremely poor sampling include casual use of the "ergodic measure" [86] (i.e., whether 
or not it approaches zero) and cluster counting |1 14| . 

4.1 Assessing dynamical sampling 

The assessment of dynamical methods illustrates the key ideas behind the relative 
measures of sampling. For this purpose, "dynamical" will be defined to mean when tra- 
jectories consist of configurations with purely sequential correlations — i.e., where a given 
configuration was produced based solely on the immediately preceding configuration(s). 
Thus, MD, Langevin dynamics, and simple Markov-chain MC are dynamical methods, but 
exchange algorithms are not [147| . Correlation times or their analogs for MC can then 
readily be associated with any trajectory which was dynamically generated. Assuming, 
for the moment, that there is a fundamental correlation time for overall sampling of the 
system, t* orr , then an effective sample size can be calculated from the simple relation 

iV eff ~ t sim /C rr , (4) 

where t sim is the total CPU time (or number of cycles) used to generate the trajectory. 
Thus, sampling quality is relative in that it should increase linearly with t sim (86). In rough 
terms, the absolute lack of sampling would correspond to N cS < 1; physically, this would 
indicate that important parts of configuration have been visited only once — e.g., a simu- 
lation shorter than t- mit + t* mr . (It would seem impossible to detect parts of the space never 
visited.) 

Recent work by Lyman and Zuckerman has suggested that a reasonable overall cor- 
relation time t* orr can be calculated from a dynamical trajectory (73). Those authors de- 
rived their correlation time from the overall distribution in configuration space, estimating 
the time that must elapse between trajectory frames so that they behave as if statisti- 
cally independent. The approach has the twin advantages of being based on the full 
configuration-space distribution (as opposed to isolated observables) and of being blindly 
and objectively applicable to any dynamical trajectory. Other measures meeting these 
criteria would also be valuable; note, for instance, the work by Hess using principal com- 
ponents (41") . 

4.2 Assessing non-dynamical sampling 

If sampling is performed in a non-dynamical way, one cannot rely on sequential corre- 
lations to assess sampling as in Eq. Q. Many of the modern algorithms which attempt to 
enhance sampling, such as those reviewed below, are not dynamical. Replica exchange 
(RE) (34pTpT8 | is a typical example: if one is solely interested in the ensemble at a single 
temperature, a given configuration may be strongly correlated with other configurations 
distant in the sequence at that T and/or be uncorrelated with sequential neighbors. At the 
same time, RE does generate trajectories which are continuous in configuration space, 
if not temperature, and it may be possible to analyze these in a dynamical sense (73) 
but care will be required [14|. Other algorithms, for instance based on polymer growth 



procedures [29y35l|126[|138|-|140|, are explicitly non-dynamical. 



Two recent papers |3~8~1|141| have argued that non-dynamical simulations are best as- 
sessed by multiple independent runs. The lack of sequential correlations — but the pres- 
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ence of more complex correlations — in non-dynamical ensembles means that the list 
of configurations cannot be divided into nearly independent segments for blocking-based 
analysis. Zhang et al. suggest that multiple runs be used to assess variance in the pop- 
ulations of physical states [141| for two related reasons: (i) such states are defined to 
be separated from one another by the slowest timescales in a system; (ii) relative popu- 
lations of states cannot be properly estimated without good sampling within each state. 
State-population variances, in turn, can be used to estimate N cS based on simple sta- 
tistical arguments [141]; nevertheless, it is important that states be approximated in an 
automated fashion — see | pT3| , [9T][T4T] | — to eliminate the possibility for bias. 

5 Purely algorithmic efforts to improve sampling 

How can we beat MD? Despite 30 years of effort, there is no algorithm that is signif- 
icantly more efficient than molecular dynamics for the full range of systems of interest. 
Further, although dozens of different detailed procedures have been suggested, there are 
a limited number of qualitatively distinct ideas. At attempt will now be made to describe 
some of the strategies which have been proposed. 

5.1 Replica exchange and multiple temperatures 

The most common strategy is to employ elevated temperature. Many variations on 
this strategy have been proposed, with one of the earliest suggestions being in the con- 
text of spin systems (27). The approach that has been applied to biomolecules most 
often is replica exchange (RE, also called parallel tempering) in which parallel simulations 
wander among a set of fixed temperature values with swaps governed by a Metropolis 
criterion |34| [5T| , [TT8] . Many variations and optimizations have been proposed for RE 
(e.g., ]8l[TT][2^[57|[64l[67l[93}^[9 ^ Closely related to RE is simu- 

lated tempering, in which a single trajectory wanders among a set of temperatures [75]; 
once again, optimizations have been proposed (e.g., (137| ). See Fig. [2} 

Less well known, but formally closely related to exchange and tempering, is "annealed 
importance sampling" (AIS), in which a high-temperature ensemble is annealed to lower 
temperatures in a weighted way that preserves canonical sampling (49|[87). See the 



schematic representation in Fig. [2} AIS has been applied to biomolecules and optimiza- 
tions have been suggested (72j[74|. AIS is nominally a non-equilibrium approach, but in 
precise analogy to the Jarzynski equality (52), it yields equilibrium ensembles. The "J- 
walking" approach also starts from high temperature [28]; a good discussion of several 
related methods is given in Ref. (9). 

5.1 .1 How effective is replica exchange? 

Replica exchange (RE) simulation is both popular and controversial. Some authors 
have noted weaknesses p"8| , [92~|[T4~8~) , others have described successes (e.g., [ [48 



102 



104| ), and some conclusions have been more ambiguous (3j[76j|96|[T43). What is the real 
story — how much better is RE than ordinary MD? 

Examination of the various claims and studies reveals that, in fact, there is little dis- 



agreement so long as the particular sampling problem is explicitly accounted for (Sec. 2.3 



In brief, when observables of interest depend significantly on states which are very rapidly 
accessed at high temperature, then replica exchange and related methods can be very ef- 
ficient |48[|102| . A prime example would be estimating the folded fraction near the folding 
transition. On the other hand, when the goal is to sample an overwhelmingly folded en- 
semble, there does not appear to be significant evidence of replica exchange's efficiency 
— so long as the initial condition is a folded configuration (i.e., in terms of the timescales 



of Sec. |3j2) if t 



kinit 



< t* orr ). This last qualification is important in understanding the data 
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Figure 2: Many ways to use many levels. Different horizontal levels represent different 
simulation conditions, such as varying temperature, model parameters, and/or resolution. 
A given ladder or set of conditions can be used in different formulations: (a) exchange, 
(b) tempering, or (c) annealing. 



presented in Ref. (48] . 

An intuitive picture underpinning the preceding conclusions is readily gained by re- 
visiting the idea that state populations are estimated by transitions between states in 
continuous trajectories (Sec. 



3.1 



Indeed, this same picture is the basis for discrete-state 
analyses [ |102[|143| , which we can consider along with the schematic landscape of Fig. 
[T] Specifically, the average of any observable A can be written as (A) = ^PiAu where 
i indexes all folded and unfolded substates and A { is the average of A within substate %. 
If pi values are significant in both folded and unfolded substates, then the elevated tem- 
peratures employed in RE simulation can usefully promote transitions to sample unfolded 



■> folded 



states [1 02]. On the other hand, if p(unfolded) < p(folded), where p(folded) = J2T"""P> ls 



the sum of folded probabilities and p(unfolded) = 1 - p(folded), then computer resources 
devoted to unfolded substates can detract from estimating folded-state observables; this 
line of analysis echoes earlier studies f92~l|143) . 

Regardless of whether RE is superior to MD, another key issue is whether RE can 
provide sufficient sampling {N cS > 10) for protein-sized systems. It is far from clear that 
this is the case. RE cannot provide full sampling unless some level of the ladder can 
be fully sampled [148] (see Sec. |3J}, and the ability for a simple dynamical trajectory to 
sample the large configuration space of a fully or partly unfolded state of a protein has not 
been demonstrated to date. 

5.1.2 Energy-based sampling methods 

Energy-based schemes must be seen as closely related to temperature-based strate- 
gies since the two variables are conjugate to each other in the Boltzmann factor, Eq. 
([1}. Yet because any fixed-temperature ensemble can include arbitrary energy values, 
energy-based schemes must account for this. "Multi-canonical" schemes based on sam- 
pling energy values (typically uniformly) have been implemented for biomolecules [40,85], 
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including with the Wang-Landau device [132]; see also (55). Nevertheless, it should be 
remembered that energy is not expected to be a good proxy for the configurational coor- 
dinates of primary interest — see Fig. [Tj 

5.2 Hamiltonian exchange and multiple models 

As schematized in Fig. |2j the different temperature-based methods sketched above 
can readily be generalized to variations in forcefield parameters [1 17| , and even resolution 
as discussed below. This is because the methods are based on the Boltzmann factor, 
which is a general form that can apply to different forcefields U x , each characterized by 
the set of parameters A = {Ai, A 2 , . . .}. Thus, the fundamental distribution Eq. ([Q can be 
written more explicitly as: 

p(x;T,A) ocexp[-C/ A (x)/fc B T], (5) 

The different schemes for using ladders of different types — based on temperature, force- 
field parameters, or resolution — are depicted in Fig. [2} One of the first proposals for 
using multiple models was given in Ref. [69) . 

The simplest way to reduce the roughness of a landscape (cf. Eq. ([3)) is by maintaining 
the functional form of U but changing parameters — e.g., coefficients of dihedral terms. 
Changing parameters in parameter files of common software packages makes this route 
fairly straightforward. Note that unless forcefield terms are explicitly removed, the step 
cost in Eq. ([3) remains unchanged. 

Hamiltonian exchange has been applied in a number of cases, for instance using 
models with softened van der Waals' interactions and hence decreased roughness [45]. 
The approach called "accelerated MD" employs ordinary MD on a smoothed potential 
energy landscape and requires reweighting to obtain canonical averages J39| , [T09 |; it can 
therefore be considered a two-level Hamiltonian-changing algorithm. 

5.3 Resolution exchange and multi-resolution approaches 

Extending the multiple-model ideas a step further leads to the consideration of multiple 
resolutions. To account for varying resolution in the formulation of Eq. (|5}, some of the Ai 
parameters may be considered prefactors which can eliminate detailed interactions when 
set to zero. Different, formally exact approaches to using multiple resolutions have been 



proposed (T5][^[70][7T[[T4 5 ] , primarily echoing ideas in replica exchange and annealing. 

Although multiple-resolution methods have not produced dramatic results for canonical 
sampling of large atomistic systems, they appear to hold unique potential in the context of 
the sampling costs embodied in Eq. ||3). In particular, low-resolutions models reduce cost 
and roughness to such an extent that some coarse models ha ve be en shown to permit 
good sampling for proteins {N eS > 10) with typical resources J77||135| . It is not clear that a 
similar claim can be made for high-temperature or modified-potential simulations of stably 
folded proteins. 

5.4 Multiple trajectories without multiple levels 

There is a qualitatively different group of algorithms that also uses multiple trajectories, 
but ajl under the same conditions — i.e., "uni-canonically" (6|[47J[89) — in contrast to ex- 
change and related methods. The basic idea of these methods is that when a trajectory 
reaches a new region of configuration space, it can spawn multiple daughter trajecto- 
ries, all on the unaltered landscape. The daughter trajectories enable better sampling 
of regions that otherwise would receive less computer resources — and also, resources 
can be directed away from regions which already are well sampled. In the context of 
Markov-state models, this strategy can be optimized to increase precision in the esti- 
mate of any observable (42). Exact canonical sampling can also be using a steady-state 
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formulation of the "weighted ensemble" path sampling method [6|; see also related ap- 
proaches |98l pf25|pf35) . 

5.5 Single-trajectory approaches: Dynamics, Monte Carlo, and vari- 
ants 

It is useful to group together a large group of approaches — including MD, simple 
Monte Carlo (MC) and Langevin dynamics J2|[32}[105] — and consider them as basic 
dynamics [147]. These methods generate a single trajectory (or Markov chain) in which 
configurations are strongly sequentially correlated — and have no non-sequential corre- 
lations. In other words, these methods should all be expected to behave roughly like MD 
simulation because the motion is intrinsically constrained by the high density and land- 
scape complexity of biomolecular systems. On the one hand, basic dynamical methods 
will generate physically realizable trajectories or nearly so, but on the other, they will be 
severely limited by physical timescales as described above. 

In this section, we will explore some basic ideas of dynamical approaches and also 
some interesting variants. 

5.5.1 Are there "real" dynamics? 

Unlike many other sampling algorithms, MD also simulates the fundamental classical 
dynamics of a system. That is, the trajectories produced by MD are also intended to 
model the time-dependence of physical processes. 

Should MD trajectories provide much better depictions of the underlying processes 
than other "basic" simulation methods — e.g., Langevin or MC simulation? The answer 
is not so clear. First, like other methods, MD necessarily uses an approximate forcefield 
(see, e.g., [31]). Second, on the long timescales ultimately of interest, roundoff errors 
can be expected to lead to significant deviations from the exact trajectories for the given 
forcefield. 

Somewhat more fundamentally, there appears to be a physical inconsistency in run- 
ning finite-temperature MD simulations for finite-sized systems. Thermostats of various 
types are used [32], some stochastic and some deterministic: is there a correct method? 
By definition, a finite-temperature system is not isolated but is coupled to a "bath" by some 
physical process, presumably molecular collisions. Because the internal degrees of free- 
dom of a bath are not explicitly simulated, again by definition, a bath should be intrinsically 
stochastic. The author is unaware of a first-principles prescription for modeling the cou- 
pling to a thermal bath, although sophisticated thermostats have been proposed [32,63]. 
For MD simulations with periodic boundary conditions, it is particularly difficulty to imagine 
truly physical coupling to a bath. 

The bottom line is that a finite-temperature process in a finite system is intrinsically 
stochastic, rendering questionable the notion that standard MD protocols produce "cor- 
rect" trajectories. To consider this point another way, should we really expect that inertial 
effects are very important in dense aqueous and macromolecular media? 

The foregoing discussion suggests, certainly, that Langevin simulations should also 
be considered physical, but what about Monte Carlo? Certainly, one can readily imagine 
unphysical MC protocols (e.g., attempting trial moves for atoms in the C-terminal domain 
of a protein ten times as often as for those in the N terminus). Nevertheless, so long 
as trial moves are small and performed in a spatially uniform way, one can expect MC 
"dynamics" to be an approximation to over-damped (non-inertial) Langevin simulation; 
see |9Q| [TTQ|[T46l . 
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5.5.2 Modifying dynamics to improve sampling 

Although changing temperature or model parameters automatically changes the dy- 
namics observed in MD or LD simulation, it has been shown that the dynamics can be 
modified and maintain canonical sampling on the original landscape, U(x) f8"4]|144| . This 
possibility defies the simple formulation of the second line of Eq. (|3), because the land- 
scape roughness remains unchanged while the number of required steps is reduced. The 
method has been applied to alkane and small-peptide systems, but its generalization to 
arbitrary systems has not been presented to the author's knowledge. A better known 
and related approach is the use of multiple time steps, with shorter steps for faster de- 
grees of freedom [ |1 19[|122| , which can save computer time up to a limit set by resonance 
phenomena Q1 06j . 

Another approach to modifying dynamics is to avoid revisiting regions already sampled 
by means of a "memory" potential which repels the trajectory from prior configurations [50, 
|60~) . This strategy is called "metadynamics" or the local-elevation method, and typically 
employs pre-selection of important coordinates to be assigned repulsive terms. Although 
trajectories generated in a history-dependent way do not satisfy detailed balance, it is 
possible to correct for the bias introduced and recover canonical sampli ng ]1"0~) . This 
approach could also be considered a potential-of-mean-force method (Sec. |5.6|> 

The potential surface and dynamics can also be modified using a strategy of raising 
basins while keeping barriers intact [39}[97J[T29); see also [149| . The "accelerated MD" 
approach is a well-known example, but achieving canonical sampling requires a reweight- 
ing procedure [;109|. Such reweighting is limited by the overlap issues confronting many 



methods: the sampled distribution must be sufficiently similar to the targeted distribution 
Eq. ([1} so that the data is not overwhelmed by statistical noise |72[|109| . 

Qualitatively, how can we think about the methods just described? For one thing, as 



dynamical methods, they all gather information by transitions among states (Sec. [37l 
more transitions suggests greater statistical quality of the data. However, methods which 
modify the potential surface may increase the number of transitions, but the overlap be- 
tween the sampled and targeted distributions generally is expected decline with more 
substantial changes to the potential. In this context, the modification of dynamics with- 
out perturbing the potential J84||1 44| seems particularly intriguing, even though technical 
challenges in implementation may remain. 

5.5.3 Monte Carlo approaches 

The term "Monte Carlo" can be used in many ways, so it useful to first delimit our 
discussion. Here, we want to focus on single-Markov-chain Monte Carlo, in which a se- 
quence of configurations is generated, each one based on a trial move away from the 
previous configuration. Typically, such MC simulations of biomolecules have a strong 
dynamical character because trial moves tend to be small and physically realizable in a 
small time increment f90j|1 10[[1"46) , whereas large trial moves would tend to be rejected 
in any reasonably detailed model. As we shall see, however, less physical moves some- 
times can be used — including move sets which do not strictly obey detailed balance but 
instead conform to the weaker balance condition [79]. It is also worth noting that the var- 
ious exchange simulations discussed above (Sees. [STTj [572| [5T3] > can be considered MC 
simulations because "exchange" is a special kind of trial move and indeed governed by a 
Metropolis acceptance criterion |147) . 

A key advantage of single-chain MC simulation is that it can be used with any energy 
function, whether continuous or not, because forces do not need to be calculated [32]. 
MC simulation is available for use with standard forcefields in some simulation packages 
|4~6p3) but it seems to be most commonly used in connection with simplified models (e.g., 
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|1 10[|146| ). The choice to use MC is usually for convenience: it readily enables the use of 



simple or discontinuous energy functions; furthermore, MC naturally handles constraints 
like fixed bond lengths and/or angles. Substantial effort has gone into developing trial 
moves useful for biomolecules, with a focus on moves that act fully locally (e.g., j5"l |1 13[ 
123[|127[[1"3~4) ). Two studies of MC employing quasi-physical trial moves have reported 



efficiency gains compared to MD [53, 124]. MC has also been applied to large implicitly 



solvated peptides [80]. Further, it is widely used in non-statistical approaches such as 
docking [62], and it also is used in the Rosetta folding software fT7) . 

For implicitly solvated all-atom peptides, a novel class of trial moves has proven ex- 
tremely efficient. Ding et al. showed that when libraries of configurations for each amino 
acid were computed in advance, swap moves with library configurations provided ex- 
tremely rapid motion in configuration space (T9). Efficiency gains based on measuring 
N eS suggested the simulations were 100 - 1,000 times faster than MD. Trial moves in- 
volving entire residues apparently were successful because subtle correlations among 
the atoms were accounted for in the pre-computed libraries; by comparison, internal- 
coordinate trial moves of a similar magnitude (e.g., twisting <p and x/j dihedrals) were rarely 
accepted (T9). 

More detailed discussion of Monte Carlo methods can be found in recent reviews 



[22,128] and textbooks [2,32|. 



5.6 Potential-of-mean-force methods 

Many methods are designed to calculate a potential of mean force (PMF) for a speci- 
fied set of coordinates and such approaches implicitly are sampling methods. After all, 
the Boltzmann factor of the PMF is defined to be proportional to the probability distribu- 
tion for the specified coordinates: p(x) oc exp[-PMF(x) /k B T) ]147| . If the PMF has been 



calculated well (with sufficient sampling along the x coordinates and orthogonal to them) 
then the configurations can be suitably reweighted into a canonical ensemble. 

There are numerous approaches geared toward calculating a PMF. Some explore the 



full configuration space in a single trajectory, such as lambda dynamics [ |58| , |120| , and 
could be categorized with modified dynamics methods, although this is just a semantic is- 
sue. A large fraction of PMF calculations employ the weighted histogram analysis method 
(WHAM) (59), but many alternatives have been developed (e.g., |fT[ |23t[8T[[TTT][T42"l ). 

The main advantage which PMF methods hope to attain is faster sampling along the x 
coordinates than would be possible using brute-force sampling. This perspective allows 
several observations. First, to aid sampling, the investigator must be able to select ajl 
important slow coordinates in advance — otherwise, sampling in directions "orthogonal" 
to x will be impractical. Second, the maximum advantage which can be gained over a 
brute-force simulation will depend on how substantial the barriers are along the x coor- 
dinates; if sampling along a coordinate is slow because many states are separated by 
only small barriers, then the advantage may be modest. Last, for a PMF calculation to 
be successful, different values of the x coordinates must be quantitatively related to each 
other: depending on the details of the method, this can be directly via numerous transi- 
tions (e.g., J2~5| , [58~j|120[|l42 |) echoing Sec.|3.1[ or by requiring well-sampled sub-regions 
of overlap [59]. 

5.7 Non-dynamical methods 

Methods which do not rely on dynamics for sampling use a variety of distinct strategies, 
but as there are relatively few such efforts, they are grouped together for convenience. 
Recently, there have been a number of applications to biomolecules of old polymer 



growth ideas (e.g., [83 131]), sometimes termed "sequential importance sampling" [66]. 
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The basic idea is to add one monomer (e.g., amino acid) at a time to an ensemble of par- 
tially grown configurations while (i) keeping track of appropriate statistical weights and/or 
(ii) by "resampling" (66). These approaches have been applied to simplified models of 
proteins and nucleic acids (2~9||138j -[140|, to all-atom peptides at high temperature [126], 
and more recently to all-atom peptides using pre-sampled libraries of amino-acid con- 
figurations 1 78]. The intrinsic challenge in polymer growth methods is that configurations 
important in the full molecule may have low statistical probability in early stages of growth; 
thus application to large, detailed systems likely will require biasing toward structural in- 
formation known for the full molecule J78~l|138| . 

Some methods have been developed which attempt a semi-enumerative description of 
energy basins. For instance, the "mining minima" method uses a quasi-harmonic proce- 
dure to estimate the partition function of each basin, which in turn determines the relative 
probability of the basin (23); see also \7\. While such approaches have the disadvantage 
of requiring an exponentially large number of basins [ 1 30], they largely avoid timescale 
issues associated with dynamical methods because the basins are located with faster 
search methods. 

Another class of approaches has attempted to treat the problem of estimating the free 
energy of a previously generated canonical ensemble (43j|54j[T36|; see also (T2). In anal- 
ogy to PMF methods (Sec.|5.6|>, knowledge of free energies for independent simulations 



in separate states (for which no transitions have been observed) enables the states to be 
combined in a full ensemble: the free energies directly imply the relative probabilities of 
the states. The ideas behind such methods are fairly technical and beyond the scope of 
this review. 

6 Special hardware use for sampling 

In several cases, rather impressive improvements in sampling speed have been re- 
ported based on using novel hardware — and based on novel uses of "ordinary" hard- 
ware. It is almost always the case that using new hardware also requires algorithmic 
development, and that component should not be minimized. Little is "plug and play" in 
this business. 

6.0.1 Parallelization, special-purpose CPUs, and distributed computing 

Perhaps the best known way to employ hardware is by parallelization. On the one 
hand, parallelization typically makes sampling less efficient than single-core simulation 
when measured by sampling per core (or per dollar) due to overhead costs. On the other, 
parallelization allows by far the fastest sampling for a given amount of wall-clock time (for 
a single system) including record-setting runs |21][30|[T08 |. Recent examples of paral- 
lelization of single-trajectory MD include: a 10 usee simulation of a small domain [30]; 
/isec+ simulations of membrane proteins on the BlueGene (37) and Desmond (20) plat- 
forms; and the longest reported to date, a msec simulation of a small globular protein on 
the Anton machine ]108| ; see also (24). The Anton simulation reflects both parallelization 
and the use of special-purpose chips, which can be inferred to each provide ~10 or more 
times speedup compared to standard chips |107[|108|. 



Long simulations have significant value. They allow the community to study selected 
systems in great detail and to appreciate phenomena which could not otherwise be ob- 
served. Of equal importance, long simulations can alert the community to limitations of 
MD and forcefields |31[|36]. 

There are other parallel strategies, such as exchange simulation (see above) and dis- 
tributed computing. Distributed computing employs many simultaneous independent sim- 
ulations or — via repeated rounds of simulation — quasi-parallel computing with minimal 
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communication among simulations. While distributed computing has primarily been ap- 
plied to the folding problem [115] , recent work has shown the value of multiple short 
simulations for producing Markov state models (47|[89). Such models can be used to 
deduce both non-equilibrium and equilibrium information — thus canonical ensembles if 
desired. A distributed computing framework can also be used for multi-level simulations 
such as replica exchange [100] and in principle for other quasi-parallel methods (6). 

6.1 Use of GPUs and RAM for sampling 

There are other means to exploit hardware besides parallelization. Combining a GPU 
(graphics processing unit) and CPU, implicit-solvent simulations of all-atom proteins can 
be performed hundreds of times faster than by a CPU alone [33]. Other work has exploited 
standard memory capacity (RAM) available on modern computers: by pre-calculating 
"libraries" of amino acid configurations, "library-based Monte Carlo" was shown to sample 
implicitly solvated peptides hundreds and sometimes > 1, 000 times faster than standard 
simulation [19]. In a similar spirit, tabulation of a scaled form of the generalized Born 
implicit-solvent model was shown to lead to significant speed gain for the tabulated part 
1611. 



7 Concluding Perspective 



The main goal of this review has been to array various methodologies into qualitative 
groupings so as to aid the critical analysis necessary to make progress in equilibrium 
sampling of biomolecules. Where appropriate, an effort has been made to offer a point of 
view on essential strengths and weaknesses of various methods. As an example, when 



considering popular multi-level schemes (Secs.[57T|-|531), users should be confident that 
some level of the "ladder" can be well sampled. 

Interesting conclusions result from surveying the sampling literature. Except in small 
systems, purely algorithmic improvements have yet to demonstrably accelerate equilib- 
rium sampling of biomolecules by a significant amount. Hardware-based advances have 
been more dramatic, however. In fairness, demonstrating the effectiveness of new hard- 
ware for MD is much more straightforward than assessing an algorithm. 

To aid future progress, developing and using sampling "yardsticks" should be a key 
priority for the field (Sec. |4). Such measures should probe the configuration-space dis- 
tribution in an objective, automatic way to measure the effective sample size. Once a 
small number of standard measures of sampling quality are accepted and used , efforts 
naturally will focus on approaches that make a significant difference. Currently, there is a 
proliferation of nuanced modifications of a small number of central ideas, without a good 
basis for distinguishing among them. Unlike in the history of theoretical physics, where 
elegance has sometimes served as a guide for truth, the sampling problem cries out for 
a "bottom line" focus on efficiency. After all, it is this efficiency which ultimately will permit 
addressing biophysical and biochemical questions with confidence. 

In summary, there seems little choice but to express pessimism on the current state of 
equilibrium sampling of important biomolecular systems. Keys to moving forward would 
seem to be (i) exploiting hardware; (ii) quantifying sampling to determine which algorithms 
are more efficient than MD; and (iii) employing large-resource simulation data to provide 
benchmarks and guide future efforts. 
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